@dtcxzyw
Owner

@dtcxzyw dtcxzyw commented Jun 11, 2025

Link: llvm/llvm-project#143745
Requested by: @nikic

@github-actions github-actions bot mentioned this pull request Jun 11, 2025
@dtcxzyw
Owner Author

dtcxzyw commented Jun 11, 2025

Diff mode

runner: ariselab-64c-docker
baseline: llvm/llvm-project@0c62571
patch: llvm/llvm-project#143745
sha256: 5d368100ae3ef68ee72fdcc7010a603228e8a8d62d46557093d0826e1191cc8b
commit: 5d18fef

244 files changed, 470528 insertions(+), 474992 deletions(-)

Improvements:
  memcpyopt.NumMoveToCpy 16428 -> 16563 +0.82%
  dse.NumFastStores 1198143 -> 1198678 +0.04%
  dse.NumGetDomMemoryDefPassed 1408802 -> 1409305 +0.04%
  memcpyopt.NumMemCpyInstr 2027921 -> 2028546 +0.03%
  instcombine.NumDeadInst 45401612 -> 45402935 +0.00%
  dse.NumFastOther 517071 -> 517082 +0.00%
  simplifycfg.NumSinkCommonCode 389334 -> 389342 +0.00%
  simplifycfg.NumHoistCommonInstrs 2487836 -> 2487874 +0.00%
  instsimplify.NumSimplified 2622107 -> 2622137 +0.00%
  simplifycfg.NumSinkCommonInstrs 833795 -> 833803 +0.00%
Regressions:
  memcpyopt.NumStackMove 100798 -> 100746 -0.05%
  correlated-value-propagation.NumNonNull 12816238 -> 12814092 -0.02%
  mem2reg.NumLocalPromoted 593916 -> 593833 -0.01%
  memcpyopt.NumCallSlot 1181190 -> 1181042 -0.01%
  sroa.NumAllocaPartitionUses 290731979 -> 290721094 -0.00%
  capture-tracking.NumNotCapturedBefore 21736052 -> 21735333 -0.00%
  instcombine.NumCombined 131974382 -> 131970977 -0.00%
  sroa.NumDeleted 283480194 -> 283473229 -0.00%
  sroa.NumAllocaPartitions 83649354 -> 83647355 -0.00%
  sroa.NumAllocasAnalyzed 109312216 -> 109310445 -0.00%
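
The percentage column above is consistent with a plain relative change, (after - before) / before, printed with two decimals and an explicit sign. The following is a minimal sketch under that assumption; the helper names `relative_change_percent` and `format_stat_line` are hypothetical and are not part of the benchmark tooling.

```c
#include <stddef.h>
#include <stdio.h>

/* Relative change in percent between the baseline and patched counter.
   Assumption: the report uses (after - before) / before * 100. */
double relative_change_percent(long long before, long long after) {
    return (double)(after - before) / (double)before * 100.0;
}

/* Writes one report-style line into buf, for example
   "name 16428 -> 16563 +0.82%". Returns snprintf's result. */
int format_stat_line(char *buf, size_t n, const char *name,
                     long long before, long long after) {
    return snprintf(buf, n, "%s %lld -> %lld %+.2f%%", name, before, after,
                    relative_change_percent(before, after));
}
```

Under this reading, the many `+0.00%` and `-0.00%` entries are real but tiny changes that round to zero at two decimals; the sign alone decides whether they are listed as improvements or regressions.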

6 18 bench/actix-rs/optimized/2o6s6qtmif526itx.ll
6 30 bench/actix-rs/optimized/409utvkjqyfhgg14.ll
1 5 bench/boost/optimized/union_issues.ll
5 30 bench/box2d/optimized/imgui_demo.ll
30 59 bench/delta-rs/optimized/11w0at10aiwuq3yr.ll
3 24 bench/delta-rs/optimized/2yom0llikg21u9sa.ll
3 5 bench/delta-rs/optimized/3qkwqfk85qralejq.ll
2 18 bench/delta-rs/optimized/43y2svfstmvqcl15.ll
17 21 bench/hyperscan/optimized/rose_build_anchored.ll
4 8 bench/image-rs/optimized/1clnprdgqfw2q9lq.ll
2 2 bench/llvm/optimized/BuiltinFunctionChecker.ll
201 211 bench/llvm/optimized/CGExprCXX.ll
3 3 bench/llvm/optimized/MachineDebugify.ll
64 68 bench/llvm/optimized/RISCVISelLowering.ll
6 6 bench/llvm/optimized/SelectionDAGISel.ll
3 7 bench/meilisearch-rs/optimized/2zqq886j9ovgawmv.ll
2 6 bench/meilisearch-rs/optimized/3f4k2xees4fvt0r.ll
2 10 bench/ockam-rs/optimized/2btxi82q4wq22oyk.ll
30 39 bench/ockam-rs/optimized/2qsd987rmmdpxbp7.ll
10 20 bench/ockam-rs/optimized/2tygv1xclgfmwb14.ll
2 4 bench/ockam-rs/optimized/4op0lm10vgcgt7cp.ll
28 40 bench/ockam-rs/optimized/53knze3nqsbtlge8.ll
32 43 bench/ockam-rs/optimized/g35wyrewxj969kp.ll
1 2 bench/pola-rs/optimized/4lreosyeqk7o1vd9fcfoxznlc.ll
55 67 bench/pola-rs/optimized/akny94jrhz4eylr1elklgkf62.ll
3 7 bench/pola-rs/optimized/dgtr4n6toyrs0lo6gtn8sd4wk.ll
12 12 bench/pola-rs/optimized/eo2lit9v8mg9048herjayt2j2.ll
31 39 bench/regex-rs/optimized/4dth5ncaqumdqgby.ll
1 3 bench/ruff-rs/optimized/bl7upda05f9py2dly725522mg.ll
62 49 bench/rust-analyzer-rs/optimized/150tm5mq81nfdpak.ll
32 45 bench/rust-analyzer-rs/optimized/1rhf3pjhhflazor1.ll
57 61 bench/rust-analyzer-rs/optimized/2ajuxklycdgazr2a.ll
6 14 bench/rust-analyzer-rs/optimized/2gfayp3e9bppz63d.ll
2 2 bench/tls-rs/optimized/1oa4q9ydtxtlathz.ll
1 1 bench/tls-rs/optimized/4vvnrvl2eceao62c.ll
4 12 bench/typst-rs/optimized/4m3ebbqd1xx21e5m.ll
3 3 bench/uv-rs/optimized/0mqbxjimtna5jl9558ukl0d23.ll
2 6 bench/uv-rs/optimized/1649wnv1wecv8ot0gji7der2b.ll
12 1 bench/uv-rs/optimized/26e8un8b0wh9yex2gcsrh3a8d.ll
1 9 bench/uv-rs/optimized/7raqa92m55m8lcbuewqxc24uw.ll
4 20 bench/uv-rs/optimized/9p6nc2blny0ou0zzic9idu916.ll
134 133 bench/wasmtime-rs/optimized/47hgs4eifsow3k34.ll
3 18 bench/yosys/optimized/microchip_dffopt.ll

@github-actions
Contributor

Here is a brief summary of up to 5 major changes in the provided LLVM IR diff:

  1. Removal of SROA Alloca and Memcpy for Stack-Allocated Buffers
    In several functions, stack-allocated arrays (e.g., [61 x i8], [32 x i8]) are no longer created using alloca as part of Scalar Replacement of Aggregates (SROA) transformations. Instead of copying from temporary buffers into the final destination with memcpy, values are now directly accessed or copied from source to target without intermediate buffer allocation.

  2. Direct memcpy Between Source and Destination Pointers
    Previously, there were two memcpy calls: one to move data into an SROA-allocated buffer and another to transfer it to the final destination. Now, this has been simplified by eliminating the intermediate buffer and performing a single memcpy directly between the original source and final destination pointers.

  3. Cleanup of Lifetime Markers (llvm.lifetime.start/end) for Removed Allocations
    With the removal of certain SROA alloca buffers, associated lifetime markers have also been removed. This reflects that the intermediate allocations no longer exist and thus do not need to be tracked for sanitizer or optimizer purposes.

  4. Reduction in Unnecessary Phi Nodes and Control Flow Edges
    Some blocks contained phi nodes that tracked intermediate copies through multiple edges. These have been reduced or eliminated due to the simplification of memory transfers, leading to fewer control flow paths and simpler landing pad handling.

  5. Tail Call Optimization for memcpy
    Several memcpy calls are now marked with tail call. On an intrinsic such as memcpy, this marker mainly records that the callee does not access the caller's allocas; it is a hint that permits tail call optimization, not a guarantee that the current function's stack frame is reused.


High-Level Overview

The overall change appears to be a result of optimization passes removing unnecessary stack allocations introduced by SROA (Scalar Replacement of Aggregates), replacing them with more direct memory operations. This leads to:

  • Less use of alloca for temporary buffers.
  • Direct memcpy usage instead of double-copying via intermediates.
  • Removal of related llvm.lifetime intrinsics, since the temporary buffers they applied to no longer exist.
  • Simplified control flow and landing pad structures.
  • Use of tail call for better codegen and performance in cleanups.

These optimizations reduce stack usage and improve runtime efficiency by minimizing redundant memory operations.
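
As a source-level analogy for items 1 and 2 of the summary (the diff itself is LLVM IR, not C), the pattern being described is a copy routed through a temporary stack buffer versus a single direct copy. This is a sketch only: the function names and the 32-byte size are hypothetical, and it assumes that source and destination do not overlap and that nothing writes to the source between the two copies.

```c
#include <string.h>

enum { PAYLOAD_SIZE = 32 }; /* hypothetical size, chosen for illustration */

/* Two copies: source -> temporary stack buffer -> destination.
   The temporary corresponds to the intermediate alloca in the IR. */
void copy_via_temp(unsigned char *dst, const unsigned char *src) {
    unsigned char tmp[PAYLOAD_SIZE];
    memcpy(tmp, src, PAYLOAD_SIZE);
    memcpy(dst, tmp, PAYLOAD_SIZE);
}

/* One copy: source -> destination, with no intermediate buffer.
   Only equivalent when dst and src do not overlap (memcpy requires this)
   and src is not modified between the two copies in the version above. */
void copy_direct(unsigned char *dst, const unsigned char *src) {
    memcpy(dst, src, PAYLOAD_SIZE);
}
```

Both functions leave the same bytes in `dst` under the stated assumptions. Whether a particular file in this diff actually moved in that direction has to be checked against the IR itself; the inline review comment further down marks one hunk as a regression.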

model: qwen-plus-latest
CompletionUsage(completion_tokens=516, prompt_tokens=113125, total_tokens=113641, completion_tokens_details=None, prompt_tokens_details=None)

%.sroa.5.sroa.6.0..sroa.6.0..8.val.sroa_idx.i.sroa_idx.i = getelementptr inbounds nuw i8, ptr %.val, i64 24
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %.sroa.5.sroa.6.0..sroa.6.0..8.val.sroa_idx.i.sroa_idx.i, ptr noundef nonnull align 8 dereferenceable(16) %.sroa.532.i.i, i64 16, i1 false)
%.sroa.7.0..8.val.sroa_idx.i.i = getelementptr inbounds nuw i8, ptr %.val, i64 40
call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 8 dereferenceable(32) %.sroa.7.0..8.val.sroa_idx.i.i, ptr noundef nonnull align 8 dereferenceable(32) %.sroa.10.i, i64 32, i1 false), !noalias !12593

Regression.

@nikic

nikic commented Jun 12, 2025

/add-label regression reviewed
/close

@nikic

nikic commented Jun 12, 2025

/add-label regression
/add-label reviewed
/close

@github-actions github-actions bot closed this Jun 12, 2025
@dtcxzyw dtcxzyw deleted the test-run15594461165 branch June 12, 2025 16:45